June 7, 2018

A client wants you to predict data scientist salaries with machine learning.

Let's predict data scientist salaries

What is Machine Learning?

Step 1: Find some data

Kaggle conducted an industry-wide survey of data scientists. https://www.kaggle.com/kaggle/kaggle-survey-2017

Topics covered by the survey:

  • Compensation
  • Demographics
  • Job title
  • Experience

Contains information from Kaggle ML and Data Science Survey, 2017, which is made available here under the Open Database License (ODbL).

Step 2: Throw ML at your data

Step 3: Profit

Client: "There is a problem with the model!"

"What problem?"

Client: "The older the candidates, the higher the predicted salaries."

Looking inside the black box

How do the features influence my predictions?

Partial Dependence Plot

Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.2307/2699986
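
A minimal sketch of the idea behind a partial dependence curve: force one feature to each grid value in every row, predict, and average. The `toy_predict` model and `toy_rows` data are hypothetical stand-ins, not the survey model from this talk.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the feature in every row; all other features stay as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append((value, mean(predict(r) for r in modified)))
    return curve

# Hypothetical toy model: salary grows linearly with age and experience.
def toy_predict(row):
    return 20000 + 500 * row["age"] + 1000 * row["experience"]

# Hypothetical toy data.
toy_rows = [
    {"age": 25, "experience": 2},
    {"age": 40, "experience": 10},
    {"age": 55, "experience": 20},
]

pdp_age = partial_dependence(toy_predict, toy_rows, "age", [30, 40, 50])
for value, avg_prediction in pdp_age:
    print(value, round(avg_prediction, 2))
```

Plotting the grid values against the averaged predictions gives the curve. A rising curve for age would match what the client observed.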

Client: "We want to understand the model better!"

What are the most important features?

Permutation feature importance

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
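
A rough sketch of permutation feature importance in a simplified, model-agnostic form: shuffle one feature column, re-measure the error, and report how much worse the model gets. This is not Breiman's exact out-of-bag procedure for random forests. The error metric, `toy_predict`, and the toy data are assumptions made for illustration.

```python
import random

def mean_abs_error(predict, rows, targets):
    return sum(abs(predict(r) - y) for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, feature, n_repeats=10, seed=0):
    """Average increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)  # break the link between feature and target
        permuted = [{**row, feature: v} for row, v in zip(rows, shuffled)]
        increases.append(mean_abs_error(predict, permuted, targets) - baseline)
    return sum(increases) / n_repeats

# Hypothetical toy model: uses experience, ignores the "noise" feature.
def toy_predict(row):
    return 30000 + 2000 * row["experience"]

toy_rows = [{"experience": e, "noise": n}
            for e, n in [(1, 7), (3, 2), (5, 9), (7, 4), (9, 1)]]
toy_targets = [toy_predict(r) for r in toy_rows]  # a perfect model, for simplicity

imp_experience = permutation_importance(toy_predict, toy_rows, toy_targets, "experience")
imp_noise = permutation_importance(toy_predict, toy_rows, toy_targets, "noise")
print(imp_experience, imp_noise)
```

A feature the model never uses gets an importance of zero, because shuffling it cannot change any prediction.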

Gender?!

What's the influence of gender on the prediction?

Gender                                             Predicted salary (.y.hat)
Male                                               50564.82
Female                                             47168.67
Non-binary, genderqueer, or gender non-conforming  49904.63
A different identity                               49589.76

Explaining individual predictions

Local Models (LIME)

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Retrieved from http://arxiv.org/abs/1602.04938
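
A heavily simplified sketch of the local surrogate idea: sample points around the instance, weight them by proximity, and fit a weighted linear model to the black-box predictions. LIME itself works on an interpretable representation and fits a sparse linear model. The single numeric feature, Gaussian sampling, kernel, and `black_box` function here are all assumptions.

```python
import math
import random

def local_linear_surrogate(predict, x0, n_samples=500, scale=0.5,
                           kernel_width=0.5, seed=0):
    """Fit a proximity-weighted line to `predict` around the point x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, scale) for _ in range(n_samples)]  # perturbed samples
    ys = [predict(x) for x in xs]                               # black-box outputs
    # Samples closer to x0 get a larger weight.
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical nonlinear black box.
def black_box(x):
    return x ** 2

slope, intercept = local_linear_surrogate(black_box, x0=3.0)
print(slope, intercept)
```

The slope is the local explanation: near x0 = 3 it should be close to the true local gradient of 6, even though the black box is not linear overall.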

Some Theory

The Problem

When do we need interpretability?

When do we NOT need interpretability?

What tools do we have?

Interpretable Models

Model-specific methods

Model-agnostic methods

Example-based methods

Interested in learning more?

Backup slides

What is interpretability?

Statistical Modeling: The Two Cultures

What makes an explanation human-friendly?

Shapley Value
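
A small sketch of exact Shapley values for a single prediction, computed by averaging each feature's marginal contribution over all feature orderings. Using a single `baseline` row for "absent" features is a simplifying assumption (in practice absent features are usually averaged over the data), and `toy_predict` is a hypothetical model. This is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values, with absent features taken from `baseline`."""
    features = list(instance)
    contributions = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for f in order:
            current[f] = instance[f]       # feature f joins the coalition
            updated = predict(current)
            contributions[f] += updated - previous
            previous = updated
    return {f: total / len(orderings) for f, total in contributions.items()}

# Hypothetical toy model with an interaction between age and experience.
def toy_predict(row):
    return (20000 + 500 * row["age"] + 1000 * row["experience"]
            + 10 * row["age"] * row["experience"])

instance = {"age": 40, "experience": 10}
baseline = {"age": 30, "experience": 5}

phi = shapley_values(toy_predict, instance, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is what makes the attribution complete.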